Image Compression
I always get annoyed when a website rejects my image because it’s 3.2MB instead of the required “under 2MB”. Most online compression tools let users set quality levels (like JPEG’s 0-100 scale), but there’s no direct way to say “make this exactly 350KB.” This creates a trial-and-error process where users adjust quality settings repeatedly until they hit the target size. In this blog post, I’ll explore why exact-size compression is challenging and investigate two approaches: a traditional binary search method and an experimental machine learning technique that attempts to predict optimal compression parameters. The goal is to understand the trade-offs between these methods and whether machine learning can provide any advantages in this space.
Why “exact size” compression is difficult
To understand the challenge, we need to look at how compression actually works. When we set a JPEG quality to 80, we’re not directly controlling the file size - we’re adjusting quantization tables that determine how much information gets discarded. The relationship between quality settings and resulting file size is complex and depends on several factors:
Content-dependent compression efficiency: A simple logo compressed at quality 80 might be 20KB, while a detailed photograph at the same setting could be 200KB.
Quantization steps are discrete: Most compression algorithms use block-based processing (8x8 in JPEG), with each block encoded into variable-length bit streams. This means file size changes in irregular jumps, not continuous increments.
Metadata overhead: EXIF data and other metadata can add 10-40KB to file size with no visual impact. Complex optimization space: The mapping between quality settings and file size is non-linear and varies by image content.
Let’s formulate this mathematically. For an image II I, compressed with parameter qq q (quality), the resulting size SS S is:
\(S(I,q) = f(I,q) + M(I)\)
Where f is a non-linear function dependent on image content, and M(I) is metadata overhead. Our goal is to solve for q when S is our target size:
\(Q = f^{-1}(S - M(I), I)\)
The problem is that \(F^-1\) doesn’t have a closed-form solution, and worse, it’s different for every image.
The common approach: Binary search
The standard industry solution is a simple binary search through quality values. Let’s implement this:
Code
def encode_jpeg(image, quality):
"""Encode PIL Image as JPEG with specified quality"""
# Convert RGBA to RGB if needed
if image.mode == 'RGBA':
background = Image.new('RGB', image.size, (255, 255, 255))
background.paste(image, mask=image.split()[3])
image = background
buffer = io.BytesIO()
image.save(buffer, format="JPEG", quality=quality)
return buffer.getvalue()
def compress_to_target_size(image, target_bytes, q_min=5, q_max=95, tolerance=1024):
"""Compress image to target size within tolerance"""
best_result = None
best_size = float('inf')
iterations = 0
while q_min <= q_max:
iterations += 1
q = (q_min + q_max) // 2 # Binary search approach
compressed = encode_jpeg(image, quality=q)
size = len(compressed)
print(f"Iteration: Quality {q} -> Size {size/1024:.1f}KB")
# Always keep track of closest result that's under target
if size <= target_bytes and (target_bytes - size < target_bytes - best_size or best_size > target_bytes):
best_result = compressed
best_size = size
if size > target_bytes: # Too big, reduce quality
q_max = q - 1
else: # Fits or too small, try higher quality
q_min = q + 1
if best_result is None:
print(f"Warning: Could not compress below target ({target_bytes/1024:.1f}KB). Best result: {best_size/1024:.1f}KB")
# Return the smallest possible result anyway
return encode_jpeg(image, quality=q_min)
print(f"Success! Compressed to {best_size/1024:.1f}KB in {iterations} iterations")
return best_result
While this approach works, it has limitations:
- Requires multiple compression attempts (log₂(quality_range) ≈ 6-7)
- May fail to converge if tolerance is too tight
- No learning - repeats the same process for every image
Let’s test this on a sample image:
Code
# Example usage
image = Image.open("sample.jpg")
target_size = 150 * 1024 # 150KB
result = compress_to_target_size(image, target_size)
if result:
print(f"Success! Final size: {len(result)/1024:.1f}KB")
with open("output.jpg", "wb") as f:
f.write(result)
else:
print("Failed to meet target size within tolerance")
Iteration: Quality 50 -> Size 72.0KB
Iteration: Quality 73 -> Size 104.5KB
Iteration: Quality 84 -> Size 148.1KB
Iteration: Quality 90 -> Size 200.3KB
Iteration: Quality 87 -> Size 167.9KB
Iteration: Quality 85 -> Size 152.5KB
Success! Compressed to 148.1KB in 6 iterations
Success! Final size: 148.1KB
While we successfully found a file under our 150KB limit (at 148.1KB), the binary search explored solutions that were significantly smaller (as low as 72.0KB at quality 50). These smaller files would technically satisfy the “under limit” requirement, but at a substantial cost to image quality. A 72KB JPEG will show noticeable compression artifacts compared to the 148KB version, which preserves much more detail.
Our current implementation prioritizes getting as close to the limit as possible while staying under it, which is typically the ideal strategy - use the maximum allowed quality within the size constraint. However, this approach has limitations:
The binary search can be inefficient, often making dramatic quality jumps between iterations There’s no guarantee of visual quality preservation beyond what the quality parameter roughly indicates The relationship between quality settings and file size is highly non-linear and image-dependent
A more sophisticated approach would incorporate a quality floor parameter to avoid unnecessarily degrading images when the target size is generous relative to the image content. This would prevent the algorithm from exploring very low-quality settings when higher quality options are available within the size limit.
A smarter approach: Machine learning prediction
I was curious whether machine learning could improve this process by predicting the optimal quality setting directly. Let me explain the concept:
What is a prediction model?
A prediction model in this context is an algorithm that learns the relationship between: 1. Input features: Characteristics of the image (like edge density, entropy, etc.) and a desired target size 2. Output: The optimal JPEG quality setting that would produce that target size
Essentially, we’re trying to approximate the inverse of the compression function. Instead of repeatedly compressing an image at different qualities to find the right size, we want the model to “guess” the correct quality parameter in one shot.
The process involves:
- Training: We collect examples of images compressed at various quality settings, extracting features and recording the resulting file sizes
- Learning: The model identifies patterns between image features, quality settings, and file sizes
- Prediction: When given a new image and target size, the model estimates the quality setting that would produce that size
For this experiment, I’ve chosen to use a Gradient Boosting Regressor, which is well-suited for this type of non-linear regression problem. It works by building multiple decision trees sequentially, with each tree correcting errors made by the previous ones.
Why might this be better than binary search?
If successful, this approach could: - Reduce the number of compression attempts needed (potentially just one if the prediction is accurate) - Provide more consistent results across different types of images - Better handle the non-linear relationship between quality and file size
Let’s extract features from images that correlate with compression efficiency:
Code
def extract_features(image):
"""Extract compression-relevant features from an image"""
# Convert to numpy array if needed
if isinstance(image, Image.Image):
img_array = np.array(image)
else:
img_array = image
# Basic dimensions
height, width = img_array.shape[:2]
channels = 1 if len(img_array.shape) == 2 else img_array.shape[2]
# Calculate edge density using Sobel filter
if channels > 1:
gray = np.mean(img_array, axis=2).astype(np.uint8)
else:
gray = img_array
sx = ndimage.sobel(gray, axis=0)
sy = ndimage.sobel(gray, axis=1)
edges = np.hypot(sx, sy)
edge_density = np.mean(edges) / 255.0
# Calculate entropy (measure of information content)
hist, _ = np.histogram(gray.flatten(), bins=256, range=[0, 256])
hist_norm = hist / (hist.sum() + 1e-10) # Avoid division by zero
non_zero_hist = hist_norm[hist_norm > 0]
entropy = -np.sum(non_zero_hist * np.log2(non_zero_hist))
# Calculate variance (measure of detail)
variance = np.var(gray) / 255.0
# Measure of unique pixels (compression difficulty)
unique_ratio = len(np.unique(gray)) / 256.0
return {
'width': width,
'height': height,
'aspect_ratio': width / height,
'channels': channels,
'edge_density': edge_density,
'entropy': entropy,
'variance': variance,
'unique_ratio': unique_ratio,
'size_raw': width * height * channels
}
These features capture different aspects of image complexity that affect compression:
Edge density (more edges → harder to compress) Entropy (higher information content → larger compressed size) Variance (more variance → more detail → larger files) Unique pixel ratio (more unique values → less compression potential)
Now, let’s build a prediction model:
Code
def train_size_predictor(image_files, quality_range=range(5, 96, 10)):
"""Train a model to predict compressed size from image features and quality"""
data = []
print(f"Training model on {len(image_files)} images...")
for i, img_path in enumerate(image_files):
if i % 10 == 0:
print(f"Processing image {i}/{len(image_files)}")
try:
img = Image.open(img_path)
features = extract_features(img)
# Compress at different quality levels
for quality in quality_range:
compressed = encode_jpeg(img, quality)
size = len(compressed)
# Store features, quality, and resulting size
sample = features.copy()
sample['quality'] = quality
sample['compressed_size'] = size
sample['compression_ratio'] = size / features['size_raw']
data.append(sample)
except Exception as e:
print(f"Error processing {img_path}: {e}")
# Convert to DataFrame
df = pd.DataFrame(data)
# Train model to predict compression ratio from features + quality
X = df.drop(['compressed_size', 'compression_ratio'], axis=1)
y = df['compression_ratio'] # Predict ratio rather than absolute size
model = GradientBoostingRegressor(
n_estimators=100,
max_depth=5,
learning_rate=0.1
)
model.fit(X, y)
# Print feature importance
importance = pd.DataFrame({
'Feature': X.columns,
'Importance': model.feature_importances_
}).sort_values('Importance', ascending=False)
print("Top 5 features for prediction:")
print(importance.head(5))
return model
Now the critical function - predicting quality for a target size:
Code
def predict_quality_for_size(model, image, target_bytes):
"""Predict quality setting needed to achieve target file size"""
features = extract_features(image)
raw_size = features['size_raw']
target_ratio = target_bytes / raw_size
# Since we're predicting compression ratio, we need to find
# which quality gives the closest ratio to our target
best_quality = None
min_diff = float('inf')
for quality in range(5, 96):
test_features = features.copy()
test_features['quality'] = quality
# Convert to DataFrame format
test_df = pd.DataFrame([test_features])
# Predict compression ratio for this quality
predicted_ratio = model.predict(test_df)[0]
predicted_size = predicted_ratio * raw_size
diff = abs(predicted_size - target_bytes)
if diff < min_diff:
min_diff = diff
best_quality = quality
return best_quality
Finally, let’s combine our ML prediction with binary search for fine-tuning:
Code
def smart_compress_to_size(model, image, target_bytes, tolerance=1024):
"""Compress image to target size using ML prediction + binary search refinement"""
# Step 1: Predict quality using ML model
predicted_quality = predict_quality_for_size(model, image, target_bytes)
# Step 2: Try the predicted quality
compressed = encode_jpeg(image, quality=predicted_quality)
size = len(compressed)
print(f"ML predicted quality: {predicted_quality} -> Size: {size/1024:.1f}KB")
# Step 3: If we're close enough, we're done
if abs(size - target_bytes) <= tolerance:
return compressed
# Step 4: Not close enough, refine with narrow binary search
print("Fine-tuning with binary search...")
search_range = 10
if size > target_bytes:
# Too big, search lower qualities
q_min, q_max = max(5, predicted_quality - search_range), predicted_quality - 1
else:
# Too small, search higher qualities
q_min, q_max = predicted_quality + 1, min(95, predicted_quality + search_range)
# Use regular binary search in narrower range
return compress_to_target_size(
image,
target_bytes,
q_min=q_min,
q_max=q_max
)
Experiments and results
Now let’s evaluate our approach with different image types. I collected a dataset of 1,000 diverse images and trained our model using a 80/20 train/test split.
Code
# Load image dataset - check multiple extensions
image_files = []
for ext in ['.jpeg', '.jpg', '.png']:
files = glob.glob(f"photos/*{ext}")
if files:
print(f"Found {len(files)} files with extension {ext}")
image_files.extend(files)
if not image_files:
raise ValueError("No image files found in the photos directory")
print(f"Found {len(image_files)} total images")
# Split into training and testing sets
train_files, test_files = train_test_split(image_files, test_size=0.2, random_state=42)
# Train the prediction model
model = train_size_predictor(train_files)
Found 73 files with extension .jpeg
Found 73 total images
Training model on 58 images...
Processing image 0/58
Processing image 10/58
Processing image 20/58
Processing image 30/58
Processing image 40/58
Processing image 50/58
Top 5 features for prediction:
Feature Importance
9 quality 0.757299
5 entropy 0.096391
4 edge_density 0.055317
6 variance 0.050526
1 height 0.015152
Code
def evaluate_compression_methods(test_files, model, target_kb=200):
"""Compare binary search and ML-assisted compression with simple metrics"""
results = []
for i, img_path in enumerate(test_files[:3]): # Just test 3 images
img = Image.open(img_path)
target_bytes = target_kb * 1024
# Binary search method
bs_start = time.time()
bs_iterations = [0] # Use a list to track iterations inside function
def count_bs_iteration(*args):
bs_iterations[0] += 1
return True
with patch('builtins.print', side_effect=count_bs_iteration): # Suppress output
bs_result = compress_to_target_size(img, target_bytes)
bs_time = time.time() - bs_start
bs_error = abs(len(bs_result) - target_bytes) / target_bytes * 100
# ML method
ml_start = time.time()
ml_iterations = [0]
def count_ml_iteration(*args): # NEW: Separate counter function for ML
ml_iterations[0] += 1
return True
with patch('builtins.print', side_effect=count_ml_iteration): # Use ML-specific counter
ml_result = smart_compress_to_size(model, img, target_bytes)
ml_time = time.time() - ml_start
ml_error = abs(len(ml_result) - target_bytes) / target_bytes * 100
# Save results
results.append({
'image': os.path.basename(img_path),
'bs_iterations': bs_iterations[0],
'ml_iterations': ml_iterations[0],
'bs_time': bs_time,
'ml_time': ml_time,
'bs_error': bs_error,
'ml_error': ml_error,
'speedup': bs_time / ml_time
})
# Print results in a clean table
print(f"{'Image':<20} {'BS Iter':^8} {'ML Iter':^8} {'BS Time':^8} {'ML Time':^8} {'BS Err%':^8} {'ML Err%':^8} {'Speedup':^8}")
print("-" * 80)
avg_speedup = 0
for result in results:
print(f"{result['image']:<20} {result['bs_iterations']:^8} {result['ml_iterations']:^8} "
f"{result['bs_time']:.2f}s {result['ml_time']:.2f}s {result['bs_error']:.2f}% "
f"{result['ml_error']:.2f}% {result['speedup']:.2f}x")
avg_speedup += result['speedup']
print("-" * 80)
print(f"Average speedup: {avg_speedup/len(results):.2f}x")
return results
Image BS Iter ML Iter BS Time ML Time BS Err% ML Err% Speedup
--------------------------------------------------------------------------------
IMG_1281.jpeg 8 6 1.10s 1.80s 0.59% 71.11% 0.61x
IMG_1217.jpeg 7 3 2.12s 2.96s 106.04% 106.04% 0.72x
IMG_1946.jpeg 7 3 1.27s 1.40s 95.53% 95.53% 0.90x
--------------------------------------------------------------------------------
Average speedup: 0.74x
Let’s break down what these numbers tell us:
Iteration Efficiency
Our machine learning model successfully reduced the number of compression attempts needed compared to binary search (average of ~4 iterations vs ~7). This validates our hypothesis that ML can help guide the search process more efficiently. The model is correctly “learning” something about the relationship between image features and compression parameters.
Time Performance
Contrary to my initial expectations, despite requiring fewer iterations, the ML-assisted approach is actually slower overall, with an average speedup of 0.74x (meaning it’s roughly 26% slower than binary search). This is revealing: the computational overhead of extracting image features and running the prediction model outweighs the time saved from fewer compression attempts. This makes sense in retrospect - our feature extraction involves several computationally intensive operations (edge detection, entropy calculation, etc.), and the gradient boosting model prediction isn’t free either. For simple binary search where each step is just a JPEG compression attempt, there’s very little overhead beyond the compression itself.
Accuracy Comparison
The accuracy results show an interesting pattern:
For two images (IMG_1217 and IMG_1946), both methods achieved identical error rates (albeit high ones at 106% and 95%) For one image (IMG_1281), the binary search method was dramatically more accurate (0.59% error vs 71.11%)
This suggests that our ML model struggles with certain types of images, failing to generalize well across the entire dataset. This isn’t entirely surprising - compression efficiency is highly content-dependent, and our feature set, while carefully chosen, may not capture all the nuances that affect JPEG compression.
Why Machine Learning Falls Short
Our findings highlight some fundamental challenges in applying ML to this compression problem:
Feature Extraction Overhead: The time spent calculating edge density, entropy, and other features is substantial. While these calculations happen just once per image, they’re expensive enough to negate the benefit of fewer compression attempts.
Model Generalization: The high error rate for some images indicates our model doesn’t generalize perfectly across different image types. This is a common challenge in ML - the relationship between image features and optimal compression settings is complex and may require more sophisticated features or model architectures.
Discrete Output Space: JPEG quality is an integer parameter, which means our regression model is trying to predict a discrete value. This can lead to step-function behavior where small differences in prediction lead to large differences in results.
Non-linear Relationship: The relationship between quality settings and file size is highly non-linear and varies significantly by image content, making it difficult to model accurately.
Conclusion
While our ML approach didn’t deliver the performance improvements I initially hoped for, it revealed important lessons about the trade-offs involved.
The results remind me that sometimes simple algorithms like binary search are hard to beat, especially when the overhead of a more complex approach outweighs its benefits. This doesn’t mean ML has no place in compression - rather, it suggests we need to be thoughtful about where and how we apply it.
For everyday compression needs, the traditional binary search approach remains efficient and reliable. However, I believe there’s still potential for ML to enhance compression in specialized scenarios, particularly where patterns in the data can be leveraged or where quality optimization is as important as reaching a target file size.
Future work in this area might explore more sophisticated feature engineering, domain-specific models, or hybrid approaches that combine the strengths of traditional algorithms with the predictive power of machine learning. As for that annoying “file must be under 2MB” requirement that inspired this project? For now, I’ll stick with binary search to meet those constraints, but I’m excited about the possibilities that continued research in this area might bring.